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I. INTRODUCTION 


A. PROBLEM STATEMENT 

Currently, analysis of non-stationary spectra is an important tool in the field of signal 
processing. The goal of such an analysis is to obtain a time history of a signal's 
frequency content and track statistical changes. Such insight into a signal's temporal 
behavior 1s extremely useful in signal detection, identification, and synthesis. These 
applications are of primary importance to the fields of speech processing, acoustic 
processing, and seismic analysis among others. 

Classical methods for analyzing spectra are derived from the Fourier transform. 
Fourier analysis through the Fast Fourier Transform (FFT) is the tool of choice for 
stationary analysis due to its ease of implementation. However, the assumption of 
stationarity precludes any notion of time dependence. If the incoming signal contains 
am plitude or frequency components which vary with time, the changes are masked. 

Fourier analysis extends to time-frequency analysis through the assumption that local 
stationarity exists. This is the basis for the spectrogram. The data is segmented through 
a window to a length chosen to ensure stationarity. The Fourier transform of the 
windowed data allows the determination of the signal's energy distribution as a function 
of frequency at a given time. Sliding the window along the data results in the generation 
of a time-frequency surface. The primary drawback to the spectrogram 1s that frequency 
resolution varies directly with the window length. Temporal resolution is obtained at the 
expense of frequency resolution. The spectrogram is inadequate for rapidly varying 
spectra. 

An alternative method of analyzing time-varying spectra lies in the use of a joint 
time-frequency distribution. A time-frequency distribution represents a function 
describing the energy density of a signal simultaneously in the time and frequency 


domains. At first glance, the distribution appears to represent a statistical artifice. 


Ideally, a joint distribution has the same properties as a joint density function and may be 
manipulated in a like manner. For example, the distribution of energy with frequency 
can be obtained by integrating over time. However, time-frequency distributions do not 
have to be strictly valid in a statistical sense to be of use. Several useful distributions to 
be encountered later illustrate this feature. As long as the distribution obeys certain 
desirable properties that allow consistent interpretation of the power spectrum it serves a 
valid purpose. 

Many time-frequency distributions have been proposed, the majority of which can be 
unified under a class of distributions proposed by Cohen in 1966. The Cohen class of 
distributions is a bilinear transformation characterized by the use of an arbitrary kernel 
function. Depending on the choice of kernel function, various distributions have been 
proposed, each with desirable and undesirable properties. Well known distributions of 
this class include Wigner- Ville, Choi-Williams, Born-Jordan, and the ZAM 
transformation. The spectrogram can also be considered to be a member of Cohen's 
class. 

Although Cohen's class of distributions has many important characteristics, the 
bilinear structure of the class can produce a combination of auto-terms and undesirable 
cross-terms when the signal is composed of multiple frequency components. The 
presence of cross-terms obscures spectral features necessary for signal recognition or 
classification. The degree of interference present depends on the kernel employed in a 
particular distribution. Choosing a kernel demands tradeoffs. While cross-terms need to 
be suppressed, the properties desired of a reasonable time-frequency distribution should 
be retained. Typically a compromise is reached by sacrificing select properties to obtain 
reasonable suppression. The resulting distribution has limitations but typically performs 


well for certain classes of signals. 


B. OBJECTIVES 

The goal of this thesis is to examine the performance and limitations of selected 
kernels used with the Cohen class of distributions. The methods examined here are those 
of Wigner-Ville, Choi-Williams, and the ZAM kernel. Each method has its own 
personality and areas of application, yet each represents a particular set of compromises. 
The Wigner-Ville method excels when used with single-component linear FM signals but 
is unable to suppress the cross-terms that arise with spectrally complex signals. Both the 
Choi-Williams and ZAM kernels suppress cross-terms to some degree but do not offer 
good performance for a broad class of signals. However, individuals signals have their 
own particular characteristics which may be characterized in the ambiguity domain. A 
more effective ume-frequency distribution might attempt to take advantage of the signal's 
characteristics. One promising method employing a Gaussian signal-dependent kernel is 
examined here. By tailoring the kernel to the location of a signal's auto-terms on the 
ambiguity plane, good performance for a broad class of signals 1s possible. 

Performance of the methods above is illustrated graphically using several classes of 


synthetic analytic signals. 


IH. SPECTROGRAM 


For a band-limited, wide-sense stationary (WSS) process, the Wiener-Khinchin 
theorem [1] states that the Fourier transform of the autocorrelation function yields its 
power spectral density (PSD): 


P(f)= [Re Pat (1) 
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With a finite data set, the PSD is calculated directly from the data. The perodogram 
estimates the PSD as the magnitude squared of the Fourier transform of the data: 
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In discrete form, (2) becomes 
, ee ok 
P(A) =D amen (3) 
n=0 








where the discrete Fourier transform 1s typically calculated by an FFT algorithm. 

The periodogram can approximate a time-frequency surface by separating the data 
into contiguous blocks and processing each separately. Each block produces a spectral 
esumate. Laying the spectral lines beside one another yields an estimate of the time- 
frequency surface. Frequency resolution is directly affected by the length of the blocks 
used. Long blocks give better resolution but tend to smoothen nonstationarities. Shorter 
blocks track nonstationarities better but at the expense of frequency resolution. 

A time-frequency surface generated with the periodogram provides only a weak 
association between time and spectral behavior. The spectrogram [2] allows a direct 
association between time and spectral estimates and is a true time-frequency 
representation. The spectrogram segments data through a sliding window, centered about 


time t: 


i 2 
[x(t —te dt . 


P(t, f)= (4) 








Window duration is chosen to ensure local stationarity exists and provides a means of 
controlling the frequency resolution. The spectral estimate is both real-valued and 
positive. For discrete time, (4) becomes 
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P_(k, f) = (5) 








A measure of the spectrogram's accuracy as a time-frequency representation may be 
found by determining expressions for the instantaneous energy and energy density 
spectrum. These are found by integrating over frequency and time respectively. 


Integrating over frequency gives 


{ Ba. pdf = flxcoe w'(tT—t)dt (6) 


while integrating over time yields 


[ P.(t. fdt = [|x(o)/'|w(o - fy do. (7) 


These expressions show that the sliding window causes smearing along both the time and 
frequency directions. 

Another check of the spectrogram 1s to calculate the total signal energy, represented 
by the volume under the spectrogram. Integrating over both time and frequency results 
in 


[ fAcdraf = | flecop w?(t—1t)dtadt (8) 
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which shows that the spectrogram does not accurately represent the signal energy 


whenever the average energy of the window is not equal to unity. 


HI. GENERALIZED TIME-FREQUENCY 
REPRESENTATIONS 
(GTFR) 


A. TIME-FREQUENCY DISTRIBUTIONS 

As noted previously, the spectrogram suffers from two principal drawbacks. First, the 
time and frequency resolutions of the method are inversely related. Second, the 
assumption of local stationarity may not always be valid. In that case, the frequency 
distribution of the signal no longer represents the true PSD. 

One approach to handle non-stationary signals is to introduce a time-dependent 
correlation function into the Wiener-Khinchin theorem. In general the autocorrelation 
function may be defined as 

h+T 
Ihatn a 18) “= [x(e)x" (t+ t)dt (9) 
h 
where f, represents an arbitrary reference time and the superscript * denotes complex 


conjugate. A symmetrically specified correlation function may be developed by defining 


G G 
t=t-— d t,=t+— 10 
i 5 an 2 5 (10) 


which in turn gives 


T=t,—-t, and p= Th (11) 
Using these definitions in (9) yields a time indexed autocorrelation function: 
R,.(ty.t,) = allah aaa 
(12) 
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Using an instantaneous autocorrelation value in the Wiener-Khinchin theorem results in 


- T et 
P(t,f)= Me Vy pee ere 
(Ef) [xc \x"(t-=~)e dt (13) 


which 1s the well known Wigner- Ville distribution [3]. This distribution will be 
examined later. 

A different approach in handling nonstationary signals is to express signal energy as a 
joint function of time and frequency. The benefit of formulating a time-frequency 
distribution, 7(t,f), is that it can be endowed with properties desirable for the purpose of 
signal processing. One desirable property for a time-frequency distribution is that it 
represents a true energy distribution. An energy distribution requires three relationships 
to hold. First, the time marginal of the energy distribution represents the energy density 


spectrum: 

fra pdr=|X(fy. (14) 
Second, the frequency marginal gives the instantaneous energy: 

f Te. Aaf = x(n) . (15) 
Finally, integrating over time and frequency gives the total energy of the signal: 


I [T(, fodedf = [x(n dt = E, (16) 


As noted previously, the spectrogram represents a smeared version of the energy 
distribution. The degree of smearing depends on the window employed and on the non- 
stationarity of the signal. 

Other properties desired for a time-frequency distribution fall into two categories. 
The first are key properties without which physical interpretation of the time-frequency 
plane would be impossible. An example is shift invariance. The remaining properties, 
such as time support, are less critical and are not absolutely required for a distribution to 
be useful. In fact, not all of the properties can even be supported simultaneously. A 


listing of desired properties is as follows [4]: 


Time Shift: If the signal is shifted in time, the time-frequency distribution ts also 


shifted in time by the same amount. 


XS) 


x(t-t,) 2 T(t-t.f) il 


Frequency Shift: If the signal is shifted in frequency, the time-frequency distribution 


is also shifted in frequency by the same amount. 


X(f) > T(t, f) 


AT o So) ee Of aio) “" 
Real-valued: The time-frequency distribution is be real everywhere. 

T(t, fyeR (19) 
Non-negative: The time-frequency distribution is positive everywhere. 

TO pP20 vip (20) 
Time support: If the signal is zero at some point in time, the time-frequency 
distribution is also zero the same time. 

x(t, )=0 > T(t,, f) =0 (21) 
Frequency Support: If the signal has a spectral energy density of zero at some 
frequency, the time-frequency distribution is zero at the same frequency. 

X(f,)=0— T(t, f,) =0 (22) 


Instantaneous Frequency: The instantaneous frequency of a signal at a given point in 
time is equal to the normalized first-order moment in frequency of the time- 


frequency distribution. 
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(23) 
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TABLE I: DESIRABLE PROPERTIES FOR TIME-FREQUENCY DISTRIBUTIONS 
a 
Number Propert Number 
[ref rdt =|X(f)] 
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Total Signal Energy J T(t, f )dtdf = [lx@pP a 
P4 Time Shift ial ( typ) 17 
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Frequency Shift X(f) > T(t f) a 
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| P9 ___| Frequency Support | X(f.)=0 > TUt,f.) =0 
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Group Delay 


e Group Delay: The group delay of a signal at a given point in frequency 1s equal to 


the normalized first-order moment in time of the time-frequency distribution. 
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= 24 
t,(f) [Tepat (24) 


All of the properties discussed are summarized in Table | and denoted P1-P11 for 


later reference. 


B. COHEN'S CLASS OF DISTRIBUTIONS 
A wide variety of time-frequency distributions have been proposed, each with their 
own unique set of properties. Although the distributions proposed were all based on 


valid principles, their relationship to each other was unclear. In 1966, Cohen proposed a 


generalized phase-space distribution function from which most popular distributions can 


be derived [5]. The class of distributions 1s given by 


7 Tle TS jan @u-Or-7) 
C(t, f= | | [9(0,2)x(u+ aes (u ae dudtd@ (25) 
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where (6,7) represents an arbitrary kernel function represented in the ambiguity domain 
(Doppler-shift, time lag). Depending on the choice of kernel function in (25), literally an 
infinite number of different distributions may be generated. 

The relationship given by (25) allows identification of those properties common to all 
member distributions. For example, the bilinear structure gives rise to spurious cross- 
components with multi-component signals. However, the individual properties of a 
particular distribution are determined by its kernel function. More importantly, by 
placing constraints on the kernel function a desired set of characteristics is obtained in 
the resulting distribution. Referring back to the properties P1-P11 in Table 1, each 
imposes a different constraint on the kernel function. Table 2 shows the constraints 
necessary to achieve each property [4][6]. Since some of the constraints are mutually 
exclusive, choosing a kernel function ultimately involves trade-offs. Cohen [6] gives 
derivations of selected constraints. 

In addition to (25), Cohen's generalized distribution can be expressed in four 
additional ways [4]. First, the generalized distribution represents the two-dimensional 
convolution of the Wigner-Ville distribution with the kernel function represented in the 
time-frequency domain: 

C(t, f) = WG, f)** OU, f). (26) 
Second, the generalized distribution can be written as the double Fourier transform of the 
product between the kernel expressed in the ambiguity domain and the ambiguity 


function: 


TABLE 2: KERNEL CONSTRAINTS TO OBTAIN DESIRABLE PROPERTIES 


9(8,0) =1 V6 oe 
9(0,t) =1 Vt 
- 03 










Total Signal Ener (0,0) =1 | i aa aa 
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C(t, f) = J | (0,1) A(0,t)e 7") drd6 (27) 
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where 
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A(@,T)= Jx(u+—a Comes aca (28) 


The product of the kernel and the ambiguity function is known as the generalized 
ambiguity function and represents the characteristic function for (25). In the temporal 
domain, the kernel function becomes 


w(t,T) = | 6(0,t)e7?™"d8. (29) 


=o 
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Then, another expression for the generalized distribution 1s the one-dimensional Fourier 
transform of the convolution between the temporal domain kernel and the instantaneous 
autocorrelation: 


OS ESC w(t,t) je" dt (30) 
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where 
(age ii 
Be) ee a (i>). (31) 


Finally, in the spectral domain, the kernel function becomes 
(f,0) = [o(-O,t)e Pdr. (32) 


With this kernel, the last expression for the generalized distribution is the one- 
dimensional inverse Fourier transform of the convolution between the spectral domain 


kernel and the instantaneous spectral autocorrelation: 


ao 


C(t f) = [[Rux (f.0)* VC, Je?" de (33) 
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where 
6... Q 
Ry, (f,8) = es J~ 5? (34) 


Figure | illustrates the relationships between each of the representations shown above. 
While Cohen's generalized distribution can be expressed in equivalently in the four 


domains, some domains may more readily lend themselves to implementation. 


C. THE AMBIGUITY FUNCTION 
The ambiguity function (28) represents a frequency-indexed autocorrelation function 


[7]. When multiplied by the kernel function, the generalized ambiguity function results: 


A’(8,t) = 0(0,T)A(0,T). (35) 


W(t,f)**O(t,f) 


val oe 
Rxx(f, 8) *¥(f,8) Rxx(t,T) *W(bT) 
Ge a 
0(9,t)A(8,7) 


Legend 


F 1: Fourier transform w.r.t. 
first variable 

F2: Fourier transform w.r.t. 
second variable 


Figure 1. Relationships between Cohen's generalized distribution in various domains 


As a distribution, the generalized ambiguity function is not useful. However, as the 


characteristic equation of Cohen's generalized distribution it is an important tool in 


determining the properties of the distribution. 


Referring back to (28), the ambiguity function is bilinear with respect to the signal. 


As such, it exhibits undesirable cross-terms with multicomponent signals. Consider the 


following signal expressed as the sum of its individual components: 


N 
xf) = EAC) 
k=l 


(36) 


Substituting (36) into the ambiguity function results in auto-terms and cross-terms 


between the components: 


N 
A(O,T) = 9A, ,;(8,T)+ > Y Agt om (8,7) 
i=] 


lem 


aulo— terms cross~—terms 


where 


‘P aes tT. 
A....(0,T)= |x(ut+—)x, (u-—)e? du 
sisi(87)= Jx(ut5)x, (U5) 
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and 


(37) 


(38) 
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Oe | x(ut Dx, (us Jed (39) 


Ignoring the kernel function for now, when the ambiguity function is transformed to the 
time-frequency domain using (27), the cross-terms persist as spurious artifacts in the 
time-frequency distribution of the signal. Signal interpretation and identification become 
more difficult. 

An approach to remove the degradation in the time-frequency domain by cross-terms 
is to eliminate them in the ambiguity domain. Flandrin [8] has noted that the auto-terms 
(38) are located primarily about the origin of the ambiguity plane. Cross-terms (39) tend 
to be positioned away from the origin at a distance related to the distance between the 
signal components involved on the time-frequency plane. Recalling (35), the above 
suggests that the kernel function should apply a large weight near the origin to promote 
auto-terms and a small weight away from the origin to suppress cross-terms. Therefore, 
to reduce interference in the time-frequency domain, the kernel function should be a two- 
dimensional lowpass filter in the ambiguity domain. 

Two simple examples illustrate the structure of signals in the ambiguity domain. First 
consider a signal composed of two complex sinusoids: 

x(t) wpe 2B tan e2 (40) 
Subsututing (40) into the ambiguity function results in 


A(0,t) a Ac - Aces )5(0) + A, Ae "*2*8 (6 + (f, — f, y) a 
+A, Ae!" "88+ (f, - f,)) 
where the first term represents the auto-term and the last two, cross-terms. As (41) 
indicates, only the auto-term passes through the origin and lies directly on T axis. The 


Cross-terms parallel the Tt axis but never come close to the origin. Figure 2 shows a plot 


of the ambiguity function for 


zt J pra 


x(n)= e) 64” +e ° (42) 
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Figure 2. Ambiguity function of Equation (42) 


which demonstrates the behavior expected by (41). 


Consider next a multicomponent signal composed of linear chirps. Figure 3 shows 
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Figure 3. Ambiguity function of Equation (43) 
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the ambiguity function for 

x(n) = 46a) 5 49) (43) 
where the time period was chosen to ensure the chirps do not cross in the time-frequency 
plane. The auto-terms are seen to cross diagonally through the origin while the cross- 
terms lie far away. However, if the components of the signal touch or cross on the time- 
frequency plane, cross-terms of the ambiguity function will appear at the origin. Figure 


4 shows the ambiguity function for 
oq 2) 2 ) S$l2-a 40 \ 2 
x(n) = 40 ae), ge) (44) 


and clearly shows cross-terms passing through the origin. 

Reflectung upon Figures 2-4, the shape of the kernel's passband must be chosen with 
care to do the best job at removing cross-terms while preserving auto-terms. Consider, 
for example, the situation portrayed in Figure 5. Figure 5(a) shows the ambiguity 
function of a signal composed of two linear chirps. The signal is to be filtered in the 


ambiguity domain to remove cross-terms using a kernel possessing an elliptic passband. 





Figure 4. Ambiguity function of equation (44) 
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In Figure 5(b), the major axis of the kernel is aligned along the Tt-axis, removing all of 
the cross-terms. The resulting time-frequency distribution will be free of cross-terms and 
show good auto-term resolution. But, 1f the major axis of the kernel is changed to the 6- 
axis as in Figure 5(c), the cross-terms are not fully filtered and the auto-terms are 
partially suppressed. The resulting time-frequency distribution will show cross-terms as 
well as smoothed auto-terms. For the best time-frequency representation from Cohen's 
generalized distribution, the kemel function must take into account the signal's structure 
on the ambiguity plane. Of course, if cross-terms occur at the origin, as in Figure 4, 
removing them is very difficult and the signal's time-frequency distribution will not be 


totally satisfactory. 
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Figure 5. 


(a) Ambiguity function of two linear chirps 


(b) Filtering of the ambiguity function with an elliptic kernel (shaded 
region) oriented along the T-axis 


(c) Filtering of the ambiguity function with an elliptic kernel oriented 
along the 9-axis 


IV. FIXED KERNEL DISTRIBUTIONS 


A. WIGNER-VILLE DISTRIBUTION 

The Wigner distribution [9] was orginally proposed in the field of quantum 
thermodynamics in 1932 as a correction to the behavior of atoms at low temperatures. 
Ville [10] reintroduced the distribution in 1948 for use in signal processing and 
demonstrated its use with analytic functions. More recently, the Wigner- Ville 
Distribution (WD) has been studied extensively by Boashash [11] and Claasen and 
Mecklenbraucker [12][{13]. 

The WD is derived from Cohen's generalized distribution using the kernel function 

6(0,T) =1. (45) 


Substituting (45) into (25) gives 
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WD(t, f) = | | | x(ut 5) (u selec © \dtdud® 
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s | | x(ut =)e (u i 1 8(u — t)dtdu (46) 


= 00 


a J x(t Le aerial dt. 


As shown earlier, the WD may be interpreted as the Fourier transform of an 

instantaneous, symmetrical correlation estimate through the Wiener-Khinchin theorem. 
The WD also enjoys a simple relationship with the ambiguity function (28). Since the 
kernel function is equal to unity, (27) indicates the WD and the ambiguity function are 


related by a two-dimensional Fourier transform: 


Pe, 
WD(t, f) A(8,7). (47) 


Fo. 


Due to its simple structure, the WD is a convenient means for calculating the ambiguity 
function. 

In contrast to the periodogram, the WD possesses high temporal and frequency 
resolution simultaneously. Referring back to Table 2, the WD's kernel function ensures 
that the distribution obeys all of the properties listed with two exceptions. A non- 
negative distribution 1s not assured, a factor which limits the WD's usefulness as an 
energy distribution. Also, finite time support is violated in some cases. For example, if a 
signal contains short duration null intervals, the WD will not be zero during these 
intervals. Figure 6 shows the WD of an on-off keyed complex sinusoid and indicates the 
region over which the signal is actually zero. 

The main drawback to the WD are the cross-terms produced between frequency 
components due to its bilinear structure. Since the kernel function represents an all-pass 
filter in the ambiguity domain, the cross-terms are not attenuated in the time-frequency 


plane. Besides complicating the time-frequency distribution, most of its negative values 


Signal is zero 


in this region 





SO 60 


Figure 6. Example of WD indicating signal energy where the 
signal is actually zero. 
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arise from the cross-terms. For a signal containing N frequency components, there are 


ea 3 
2) (N-2)!2! Fa) 


cross-terms. As an example, recall (40), a signal composed of two complex sinusoids. 


Substituting (40) into (46) yields 


WD(t, f) = A 8(f — ff) + AZ5(f - f) 


+2A,A,5(f - u ot )cos((f, — f,)£). 


(49) 





As (49) shows, and 1s generally true, the cross-term appears at the arithmetic mean of the 
frequencies of the two components involved. For this example, the magnitude of the 
cross-term 1s twice as great as the auto-terms if the auto-terms are of the same magnitude. 
Figure 7 shows the WD of the complex signal (42). 

The WD may be expressed in discrete form [13] as 


WD(n, f) =2 ¥ x(n +k)e"(n— ker. (50) 


k=-co 


Unlike the spectra of discrete time signals, (50) is periodic in f with period 7 instead of 





Figure 7. WD of two complex sinusoids, Equation (42) 
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2m. Since the spectra of real valued signals is non-zero in the interval [0,27], sampling at 
the Nyquist rate leads to aliasing of the spectrum. To avoid this, the signal must be 
sampled at twice the Nyquist rate, 

f.24f.35 (51) 
or the discrete time signal must be interpolated by a factor of two. 

An alternative that allows sampling at the Nyquist rate 1s to use only analytic signals 
which have a non-zero spectrum only in the interval [0,7]. Every real valued signal x(n) 
has an associated complex valued analytic signal x(n) such that 

x,(n) = Re[x(n)]. (52) 
The analytic signal is obtained from the real valued signal through the relationship [14] 

x(n) =x,(n)+ jH[x,(n)] (53) 
where H[-] represents the Hilbert transform. A faster method is to take the Discrete 
Fournier Transform (DFT) of x(n), zero out the negative frequencies, multiply the 
positive frequencies by two, and take the inverse DFT. An additional benefit of using 
analytic signals is reduction of cross-terms. Since only positive frequencies are present, 
the cross-terms arising from interaction between positive and negative frequency 
components are avoided. 

In actual practice, the data used in the discrete WD is typically windowed to 
smoothen the spectral estimate. The resulting distribution is known as the pseudo 
Wigner distribution (PWD): 

PWD(n, f) =2 ¥ x(n +k)x*(n-k)w(k)w(-k)e (54) 


k=—oo 


Two-dimensional smoothing functions have also been developed to suppress cross-terms 


and obtain positive distributions [6]. 


ys 


B. EXPONENTIAL DISTRIBUTION 
The exponential distribution (ED) was proposed by Choi and Williams [15], based on 


the kernel function 


Q7 77 


Opp (8,t)=e ° (5) 
where 6 1S a positive scaling factor. Referring to Figure 8, (55) decays with increasing 
Ot and acts as a two-dimensional lowpass filter. Accordingly, the ED demonstrates 
suppressed cross-terms while retaining most of the advantages of the WD. However, the 
ED does not obey the time and frequency support properties, and does not guarantee a 
positive distribution. 

Substituting the kernel function (55) into Cohen's generalized distribution (25), an 


expression for the ED is obtained: 





Figure 8. Contour plot of the ED's kernel function in the 
ambiguity domain. 
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The ED can be interpreted as the Fourier transform of a time-indexed autocorrelation 


estimate 


ED(t, f) = [KG,teP™ de (57) 


=D 


where 


-1)’ 
K = = a 
(a) — = ae re ex - rae oe yx" (u du. (58) 


The autocorrelation estimate (58) represents a time average. To preserve the signal's 





time-varying properties, the exponential term applies a larger weight when uw is close to ¢ 
and a smaller weight when they are farther apart. To preserve accuracy, the range of the 
time average 1s controlled by t. For large values of t, a wider range is used and, 
conversely, a smaller range for small values of t. 

The kernel function performs filtering in the ambiguity domain, emphasizing features 
lying close the origin and the axes. Width of the filter's passband is controlled by the 
scaling factor, o. For small values of o, the filter rolls off more sharply. Increasing o 
widens the passband and as o approaches infinity, the WD is obtained. The value of o 
also affects the autocorrelation estimate represented by (58). More averaging, and 
therefore more smoothing of the autocorrelation estimate, takes place as o is decreased. 


As oO increases, (58) approaches an instantaneous autocorrelation estimate. Choosing a 
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value for 6, then, imposes a tradeoff. Larger values of o give a sharper estimate of the 
autocorrelation function at the expense of weaker cross-term suppression. Smaller values 
tend toward the opposite. Choi and Williams have recommended that o be in the range 
0.1 to 10. 

To demonstrate the ability of the ED to suppress cross-terms, consider again (40), a 


signal composed of two complex sinusoids. Substituting (40) into (56) yields 


ED(t, f) = A; 5(f — f,) + A;8(f - fi) 


(59) 
+2A,A; cos((f, = A Ons. A » fy.) 


where 


\@) 


4n( fi -f,) 


oo ons care | 


n(f.f,, 4.0) represents an attenuation factor that reduces the amplitude of the cross-term 


NS Arf, O)= 
(60) 


by spreading it along the frequency axis. The amplitude of the cross-term at a given 
frequency f depends on two factors. First, the amplitude is inversely proportional to the 


difference f, — f,. Second, the amplitude decreases exponentially with distance away 
from the cross-term's center frequency. A smaller value of o leads to a greater spreading 


of the cross-term while increasing 6 causes (59) to approach the WD result, (49), since 
+ 
lim NF. fifo) =f -L22) (61) 


Figure 9 shows the ED of the complex signal (42). 


The ED is expressed in discrete time [15] as 
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ED(n, fo= 2 x W ee D ea TO pesca 


{T=—00o u=—0o 


(62) 





ur : 
exp{- a = rin +T)x (n+u 1) 
where W,,(t) and W,,(u) represent finite windows which slide along the time axis. 
W,, (7) is an arbitrary, Symmetrical window whose length and shape determine the 
frequency resolution of the distribution. The inner window, W,, (uw), is rectangular and 


determines the range over which the autocorrelation function is estimated. Like the WD, 


the discrete ED is periodic in f with period 47 and is typically used with analytic signals. 


C. THE CONE-SHAPED (ZAM) KERNEL 

Although the ED offers suppression of cross-terms in the time-frequency domain, 
finite time support is sacrificed. To accomplish both goals simultaneously along with 
improved frequency resolution, Zhao, Atlas, and Marks [16] have proposed the cone- 


shaped (ZAM) kernel. The resulting distribution, however, does not satisfy time or 


tie 
i} \ “ iio 
Mi tH Hi i Hie 


Figure 9. ED of two complex sinusoids, Equation (42), 
using 6 = 10 
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frequency marginal properties. 
The kernel function is derived by considering the constraints necessary to achieve the 
desired properties. Referring back to Table 2, to maintain finite-time support the kernel 


must satisfy 
[(6,t)e?™"'d0 =i(), wipe |i (63) 


Equivalently, this constraint can be expressed in the temporal domain (t,t) as 

o(1,t)=0,  |t|< Ie]. (64) 
Therefore, the region of support for the kernel function in the (t,t) plane is limited to the 
cone-shaped region indicated in Figure 10. An appropriate choice for the kernel $(¢,7) 
rests on three considerations. First, for the best temporal resolution, the kernel should 
not be a function of time, t. This requirement assures that temporal smoothing does not 
take place. Second, to smooth the autocorrelation estimate and reduce cross-terms, the 
kernel should be a low-pass filter in the t-dimension. Combining these requirements 


with (64) gives 


a “a lz] = ale| 65) 


@) otherwise 





be n 


Figure 10. The support region 
for the kernel O(¢,T) 
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where g(T) is a Suitable filter and a represents the slope of the cone subject to 2Sa<oo. 
The final consideration for the kernel is that it take the form of a lateral inhibition 
function in frequency, f, in the frequency plane (9, f). Lateral inhibition functions have 
been shown [17] to occur naturally in human vision and auditory systems, and enhance 
perception and feature detection. In signal processing, a function of this type represents a 
weighting scheme (shown in Figure 11) that enhances spectral peaks when convolved 
with the spectrum. Meeting the above also assures that the kernel will be low-pass in the 
8-dimension which aids in suppressing cross-terms. All of the above conditions can be 
satisfied by using any of the popular windowing functions for g(t). 
With a suitable choice for g(t), the kernel (65) can be expressed in the ambiguity 

domain [18] as 

(8,7) =|tsinc (6T) (7). (66) 
Substituting (66) into Cohen's generalized distribution (25), an expression for the cone- 


shaped kernel distribution (CSD) is obtained: 


frequency of interest Positive Weighting 





Negative Weighting 


cree e: 


ff 
Figure 11. Weighting requirements for lateral inhibition 
functions 
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The CSD can also be interpreted as the Fourier transform of a windowed, local 


autocorrelation function 


CSD(t, f) = | g(t) K(t, te dt (68) 


where 


(+(t/2) t x 
K(t,t) = ~)x*(u ——)du. 
(t,t) | x(Ut )x°(u su (69) 


t-(t/2) 
The interval used to estimate the local autocorrelation function reflects the cone-shaped 
region of support and allows the CSD to track signals with rapidly varying nonstationary 
behavior. 
The CSD is expressed in discrete time [16] as 
ES? y e(k)e? owe — p+k)x’*(n- p-k). (70) 
ke-L pH 
To obtain a real distribution, the length of the window function g(k) must be odd. Since 
this implementation would preclude using an efficient FFT algorithm, (70) may be 
rewritten exploiting the symmetry of the window. Summing over one side of the 
window, (70) becomes 
L 
CSD(n, f) = are a(k)R(n,k)e | (71) 
k=0 


where 


FS, 


; 0.5e(k) k=0 
(k) -| (72) 
g(k) otherwise 
and 
K 
R(n,k) = Sx(n-p40r oe (73) 


p= 
Now an even window length is possible and the FFT algorithm may be employed. 


Figure 12 shows the CSD of the complex signal (42) using a Gaussian window. 


D. GENERALIZED EXPONENTIAL DISTRIBUTION 

Boudreaux-Bartels and Papandreou [19] have noted that, in the ambiguity domain, the 
exponential kernel represents a poor filter. The kernel does not have a flat passband and 
rolls off very slowly. As a result, cross-terms near the origin are barely attenuated while 
auto-terms are distorted due to the narrow passband. Both of these problems can be 
remedied by using a lowpass filter with a flat passband and a narrow transition region to 


the stopband. 





Figure 12. CSD of two complex sinusoids, Equation 
(42), using a Gaussian window 
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In place of the ED, Boudreaux-Bartels and Papandreou have proposed the 


Generalized Exponential Distribution (GED) characterized by the kernel function 


e\ "(+ 
Geen =e a G (74) 
where WN and M are positive constants representing the kernel order and 9, and T, are 
scaling constants. Comparing Figure 13 with Figure 8, the GED kernel shows a much 


flatter passband and steeper roll off. The GED includes the ED as a special case where 


N =M=1 and 0-1; =o. Also, the GED shares all of the properties of the ED. 

The four parameters of the GED distribution completely control its shape in the 
ambiguity domain and are chosen in reference to the signal's structure in the ambiguity 
domain. The parameter's role in shaping the filter can be investigated by considering the 


one-dimensional filter 


Op(x) =e 5 (75) 
which 1s plotted in Figure 14 for various values of NV. As N increases, the passband 


flattens and the transition region narrows. The scaling factor, x,, determines the width of 


So 





Figure 13. Contour plot of the GED's kernel function in the 
ambiguity domain. 


3] 


the passband region. A set of criteria have been developed for determining the optimum 
set of parameters from a given set of constraints on the passband and stopband. 

Due to the complexity involved, the GED is implemented solely in the ambiguity 
domain. The ambiguity is first calculated and then multiplied by the kernel. Taking the 


two-dimensional FFT of the product gives the GED. 


E. REDUCED INTERFERENCE DISTRIBUTIONS 

As seen earlier, for a time-frequency distribution to have the properties listed in Table 
], its associated kernel must satisfy the constraints listed in Table 2. In addition, to 
suppress cross-terms the kernel must be a lowpass filter in the ambiguity domain. Jeong 
and Williams [4] have introduced a subset of Cohen's generalized distribution, the 
reduced interference distribution (RID), that has all of the above desirable characteristics. 
By considering the limitations above imposed on the kernel, they have also introduced a 
simple design process to produce RID kernels and thus, members of the RID class. 

The design process proposed by Jeong and Williams to design a RID kernel consists 
of three steps. 


Step 1: Determine a real-valued, primitive function h(t) such that: 





oO o.5 4 1.5 2 


x/x, 
Figure 14. Variation of the lowpass filter given by Equation (74) 
with the parameter N. 
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R1: h(t) has unit area, 
[a(nde =]. (76) 


R2: h(t) 1s symmetrical about the origin, A(t) = h(-t). 
R3: h(t) is only non-zero in the interval [-'42,Y]. 
R4: h(t) has little high frequency content so that H(8) represents a lowpass filter. 


Step 2: Take the Fourier transform of h(t), 


H(8) = [h(te?™dt. (77) 


—co 


Step 3: The RID kernel is obtained by replacing 8 in H(@) with @r, 
Q pip (9,7) = H(t). (78) 


In Step 1, the requirements R1-R3 ensure that the resulting kere] will meet the 


constraints listed in Table 2. Condition R1 implies that constraints Q1-Q3 will be met 


since 


H(O)= {nce =] (79) 


and the argument of H(8t) becomes zero whenever 8 or T are zero. Condition R2 


ensures that H(8) will be real which in turn implies that Q6 holds. Also, Q10 and Q11 


are also implied unless the derivatives fail to exist. Condition R3 ensures Q8 since 


wit) = Joe.) 


-—oo 


2 ed (80) 


{r| T 
=0 if |t|<2J¢. 
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Since a similar relationship holds for ‘¥(8,f), Q9 also holds. As the resulting kernel is not 
a function of time or frequency, Q4 and QS are satisfied by default. However, Q7 will 
not be satisfied. 

Condition R4 requires h(t) to be lowpass in the frequency domain. Then, performing 
the substitution 98t for 9 in Step 3, the resulting kernel is guaranteed to be lowpass in the 
ambiguity domain and thus suppresses cross-terms. However, meeting R4 involves a 
trade-off since suppression is purchased at the expense of lower auto-term resolution due 
to temporal smoothing. 

There are two useful methods for determining a good choice for the primitive 
function h(t). First, any of the popular windows used in spectral analysis represent 
reasonable choices as long as they are truncated in time to satisfy R3. Another method is 
to use the windowing technique found in FIR filter design. The kernel o(8,t) = H(0t) is 
first specified in the ambiguity domain. Reversing Step 3, h(t) is found from the inverse 
Fourier transform of H(8). Then, h(t) is multiplied by a rectangular window whose 
support is limited to [-%2,%] so that R3 is satisfied. 

After h(t) has been designed, the RID is derived from Cohen's generalized distribution 


(25) as 
RID ; ~ tf ] =f Oa T -j2ntf 
(npiny= | Joh — Ju Sy (use PY dudt. (81) 


The RID can be interpreted as the Fourier transform of a generalized autocorrelation 





function 
RID(t, f:h) = J K(t,t:h)e 2?" dt (82) 
where 
1 —t ‘Tan *¢ 
Kiwen)= fi ras Ds age (83) 
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Literally an infinite number of likely primitive functions, h(t), exist. For example, the 


function 


I1<¥ 


84 
otherwise, (es 


h(t) =rect(1) = ‘ 


leads to the Born-Jordan distribution [6]. Another example is the triangular function 


nom Poe I< ¥ GS 


0 otherwise 


which gives the kernel 
(8,7) = sinc’ (= (86) 


None of the previously discussed kernels qualify as RID kernels although each can be 


derived from the process above by relaxing some of the requirements for h(t). 
F. SUMMARY OF PROPERTY SUPPORT 


Table 3 allows comparison of each time-frequency distribution discussed above in 


terms of how they support the desired properties listed in Table 1. 


TABLE 3: PROPERTY COMPARISON OF TIME-FREQUENCY DISTRIBUTIONS 


Distribution _| Pi | P2 | P3 | P4 | PS | P6 | P7 | P8 | PO | PIO | Pil | 


Wigner 


cioiwatams | x | x | 


CSD (ZAM 
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V. ADAPTIVE RADIALLY-GAUSSIAN KERNEL 


A. BACKGROUND 

Even though most of the kernels presented in Chapter IV can be adjusted to give 
varying amounts of cross-term suppression, they do not actively take into account the 
nature of the signal itself. Since the positions of the auto-terms and cross-terms depend 
on the signal, fixed kernels can only be expected to give good results for a limited class 
of signals. For example, consider the ED, GED, and the RID. To preserve the time and 
frequency marginals, their kernels are constrained to be unity along the T and 6 axes. For 
a signal composed of complex sinusoids, such as presented in Figure 2, these 
distributions give good results. Generally these distributions work best with signals 
whose auto-terms closely parallel either the t or 8 axes. However, for signals whose 
auto-terms do not lie on either axis, such as the signals in Figures 3 and 4, worse results 
are obtained. Clearly, to handle a broad class of signals the kernel function must be 
made signal-dependent even at the expense of sacrificing a few desired properties in the 
resulting distribution. 

A procedure for designing signal-dependent kernels has been proposed by Baraniuk 
and Jones [20]. Their procedure consists of choosing an optimal kernel from a class of 
kernels defined by a series of constraints. Example constraints include forcing the kernel 
to be lowpass to suppress cross-terms or ensuring that the kernel preserves the time and 
frequency marginals. The optimal kernel is the one which maximizes some particular 
performance measure. Depending on the kernel constraints and the performance measure 
chosen, different optimal kernels are realized. 

Baraniuk and Jones chose as the basis for their adaptive kernel the class of radially- 


Gaussian functions, 
67 +17 


Meese (87) 


which may also be expressed in polar coordinates as 
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re 


‘ Go 
o(r,w)=e °™. (88) 
The shape of the kernel depends only on o(y), the spread function, which controls the 


spread of the kernel along a radial line oriented at angle y, where 


v = arctan =} (89) 


Clearly, optimizing the kernel consists of obtaining an optimal spread function. Figure 
15 shows an example spread function and Figure 16 shows the resulting kernel. 
Given the class of kernels above, finding the optimal kernel becomes an optimization 
problem. The optimal kernel is defined as the one whose spread function yields 
2h 


max J {A(r,y)o(r,y)| rdrdy (90) 
00 


where A(r,y) is the polar representation of the ambiguity function. Furthermore, the 


optimization 1s subject to the constraints 


r? 


~ 267(y) 


o(r,y) =e ayy 


o(y) 


S\ 


0 y T 





Figure 15. An example spread vector 
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Figure 16. Radially-Gaussian kernel corresponding to the 
spread vector shown in Figure 15 


and 


2% oc 
= | floC.w) rdrdy so, «20. (92) 
us 0 0 


Substituting (91) into (92) and recognizing that since the ambiguity function is 
symmetric about the ongin, the spread function only has to be determined over the range 


[O,x). The latter constraint simplifies to 


= o?(w)dy <a. (93) 
nt 0 


The purpose of the performance measure, (90), and the constraints (91)-(93), 1s to 
give an optimal kernel which suppresses cross-terms while the distortion of auto-terms is 
minimal. Equation (91) limits the kernel to the class of radially-Gaussian keels, which 
are lowpass in nature and thus suppress cross-terms. The constraint on kernel volume, 
(92) or (93), controls the trade-off between cross-term suppression and auto-term 


smearing. Increasing the volume raises the probability that auto-terms are passed by the 
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kernel unattenuated while also raising the probability that some cross-terms pass. 
Decreasing the volume performs the opposite. The recommended range for @ has been 
given as 

l<a<5 (94) 
where the lower bound arises from the volume of the spectrogram kernel. Fixing the 
upper bound consists of finding the best compromise between smearing the auto-term of 
a simple Gaussian signal and passing the majority of its energy. Appendix B shows the 


derivations for the lower and upper bounds. 


B. IMPLEMENTATION 
The first step in implementing a solution for the optimal kernel 1s to rewrite the 
performance measure, (90), and the constraints, (91 and (93), in the discrete domain. 


Sampling the radially-Gaussian kernel on a polar grid gives 


__(pd,)? 
o,(p.q)=e ey) (95) 


pe Oe’ =. g=9,....Q-1, 


where p and q represent the radial and angular indices, and A, and A, are the 


corresponding step sizes. The discrete spread function is a positive vector consisting of 
samples from the spread function, 
5, = a(gA,}. (96) 
Using the polar grid defined in (95), the optimal discrete kernel is the one whose spread 
vector yields 
Q-1 P-I ; 
ne AA, 2 2 PIA, (p.4)o,(P.9) (97) 


subject to the constraints 
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: (98) 
and 


aN 
ae <a, a20. (99) 
Dine 


Calculating the performance measure, (97), requires that the ambiguity function be 
sampled on a polar grid, A,(p,q). Two approaches can be taken. First, the polar-sampled 
ambiguity function can be calculated directly as shown in [21]. Alternately, the polar 
samples can be interpolated from a rectangularly-sampled ambiguity function. The 
interpolation 1s performed by first defining a polar grid. At each point (r,y) on the grid, 
the point is converted back into equivalent rectangular coordinates. Next, the four 
nearest neighbors bounding the point are found and bilinear interpolation [22] is used to 
estimate the ambiguity function at the desired point. For greater accuracy, the closest 
three out of the four bounding points can be used to perform two successive linear 
interpolations {21] [22]. Due to the symmetry of the ambiguity function, only the upper 
half of the ambiguity plane need be sampled. 

Restating the optimization problem, the goal is to find the spread vector 


=|, we oe | which maximizes the function 


f(o)=NA LDA, (p.9)0,(p.9)) 


g=0 p=l 


(100) 


_{pA,J’ 


= KS 5 A, DP, 9) omnes 
q=0 p=l 
subject to the constraint (99). An appropriate method for solving this problem is the 
steepest ascent algorithm, 


o(k +1) =o0(k)+uVF(k), (101) 


where the present guess is updated in the direction of the gradient 


40) 





Vf(k)= = ang | (102) 


The step size, [t, is chosen to ensure to ensure the most rapid convergence without being 
so large that the process becomes unstable. For (100), the elements of the gradient vector 
are 


(pa,)° 


of oan 
d6,(k) Pas (P.9)| q=0,....Q-1. (103) 





The algonthm is initialized with 





mal - 1 


o(0) = (104) 


which represents a circularly symmetric kernel of volume a. 
The steepest ascent algorithm does not include the volume constraint (99). Since the 
gradient is always positive, the volume of the kernel increases with each iteration. To 


keep the volume constant at &, the spread vector is scaled after each iteration by 





271A 


o(k+1)<— o(k +1) A, o(k+1) 


(105) 
This operation forces the kernel volume to & without changing the shape of the spread 
vector. 

Once the optimal spread vector is found, the optimal radially-Gaussian kermel is 
generated in rectangular coordinates using (87). Since only samples of the spread vector 
are available, interpolation is necessary to give a smooth kernel. Typically a simple 
cubic spline gives excellent results. Then the time-frequency distribution is given by a 
two-dimensional DFT of the generalized ambiguity function, which in turn is the product 


of the optimal kernel and the rectangularly-sampled ambiguity function. Figure 17 
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Figure 17. The optimal kernel generated for the complex 
signal given by Equation (42) 


shows the optimal kernel found for the complex signal (42) and Figure 18 shows the 
resulting time-frequency distribution. 

The adaptive radially-Gaussian kernel demonstrates excellent results for a broad class 
of signals but has some drawbacks. First, this technique is computationally much more 
expensive than the fixed kernels shown in Chapter Four. Convergence to an optimal 
spread vector is slow; typically, about thirty iterations are needed. Also, interpolating to 
find the polar-sampled ambiguity function and later generating the optimal kernel are 
time-consuming. The second drawback 1s the sacrifice of desirable properties in the 
resulting time-frequency distribution. Of the properties listed in Table 1, the optimal 
kernel] only guarantees preservation of signal energy, shift invariance, and a real-valued 
distribution. Baraniuk has shown how to extend the optimization procedure outlined 
above to include additional constraints on the kernel such that marginal and time support 


properties are guaranteed by the optimal kernel [21]. 
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Figure 18. Time-frequency distribution generated from 
the optimal kernel for the complex signal 
given by Equation (42) 
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VI. COMPARISON OF TIME-FREQUENCY 
DISTRIBUTIONS 


A. EXPERIMENTAL ANALYSIS 
This section compares the performance of the fixed and adaptive kernel time- 
frequency distributions using several classes of synthetic analytic signals. The intent is to 
identify any additional benefits or drawbacks for each specific distribution, such as 
resolution, and demonstrate how the distributions obey the properties listed in Table 1. 
Distributions considered here are: 
e Wigner-Ville 
e Choi-Williams 
« ZAM 
e Signal dependent distribution using the Adaptive Radially-Gaussian Kernel 
Seven test signals are used to evaluate each distribution. Each signal consists of 512 
points and is analytic. In each case, the time-frequency surface is built using 64-point 
DFTs and is displayed as either a contour or mesh plot, depending on which best displays 
the surface features. To keep the plot size manageable, the time window was moved in 
steps of eight data points, resulting in a 64 x64 surface. The test signals are: 


1. Two-component analytic sinusoid where the components are close in frequency, 


adety | rp 


2. Frequency shift keyed signal, 


x(n)=e 


Brees 

tab l<n<128 
me) 

x(n)= ta} 129<n< 384, 
euile 
a 385<n<512 





3. Mono-component FM linear chirp, 
: Saul 
paleniiersts 


x(n)=e : 


4. Two component signal composed ot two parallel, FM linear chirps, 





ip +2 sh 2 1 ~ 
x(n)=e \% 45127 4 9° \e 64512 


b 


5. Mono-component FM quadratic chirp, 


ae (4) 
j2q —+—| — 
64 64\512 


x(n)=e , 


6. Multi-component signal composed of two FM quadratic chirps and an analytic 
sinusoid, 


Le ee Saf cn \* 95 
64 64\512 64 64\512 
+e +e 7. 


x(n) =e 


7. Noisy signal composed of two FM linear chirps crossing in the time-frequency 
plane with an SNR of 3 GB, 


{(afn .»_{ Sl2=—n 49 \ 2_ 
x(n) = sei alae) + se 0 Kes) +1(n). 


B. TEST SIGNAL ONE 
Test signal 1 consists of two analytic sinusoids, spaced very closely in frequency, and 
is used to demonstrate the frequency resolution of the different distnbutions. The 


Wigner distribution, Figure 19(a), only weakly suggests the presence of two frequencies. 
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Oscillatory modulation due to the cross-term is imposed on the auto-terms. With the 
exponential distribution, Figure 19(b), the cross-terms is smoothed out slightly and the 
auto-terms are more prominent. The presence of two distinct frequency components is 
clearly visible using a o of 1. Lower values cause the auto-terms to mere due to 
smoothing while larger values give the same result as the Wigner distribution. The ZAM 
distribution, Figure 20(a), shows only a single smoothed, oscillatory frequency 
component and fails to resolve the sinusoids. Finally, the optimal distribution, Figure 
20(b), gives a very sharp spectrum indicating the presence of three frequency 
components. These are the auto-terms and a smoothed cross-term. The optimal kernel 
does not completely remove the cross-term due to its close proximity to the auto-terms 


on the ambiguity plane. 


C. TEST SIGNAL TWO 

Test signal 2 is a frequency shift keyed signal and is used to demonstrate how the 
distributions differ in frequency support. The Wigner distribution, Figure 21(a), shows 
violation of frequency support as well as cross-terms between each frequency transition. 
This effect can be minimized by reducing the size of the window but at the expense of 
frequency resolution. For the exponential distribution, Figure 21(b), the cross-terms 
have been eliminated and the leakage of spectral energy to other frequency bins is much 
improved over the Wigner distribution. The ZAM distribution, Figure 22(a), gives the 
best results as expected due to its emphasis on preserving time and frequency support. 
Spectral leakage is minimal; only a small widening of the peaks is observed at the 
beginning and end of each spectral component. Finally, the optimal distribution gives 
very disappointing results. While the distribution has no evident cross-terms, the surface 
is covered by artifacts due to the optimal kernel's lack of time and frequency support. 


These artifacts can be reduced by decreasing the volume of the kernel. 
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Figure 19. Wigner and exponential distributions for test signal 1 
(a) Wigner distribution 
(b) Exponential distribution 
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(b) 


Figure 20. ZAM and optimal distributions for test signal 1 
(a) ZAM distribution 
(b) Optimal distribution 
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Figure 21. Wigner and exponential distributions for test signal 2 
(a) Wigner distribution 
(b) Exponential distribution 
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Figure 22. ZAM and optimal distributions for test signal 2 


(a) ZAM distribution 
(b) Optimal distribution 
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D. TEST SIGNAL THREE 

Test signal 3 consists of a single FM linear chirp. All of the distributions give 
excellent results and track the instantaneous frequency of the signal accurately. For large 
values of 6, the Wigner distribution, Figure 23(a), and the exponential distribution, 
Figure 23(b), give virtually identical results. Reducing o smoothes the chirp in the 
exponential distribution. The ZAM distribution, Figure 24(a), gives very sharp results. 
Lastly, the optimal distribution, Figure 24(b), gives a narrow, smoothed chirp but with 
declining magnitude near the beginning and end of the chirp. Increasing the volume of 


the optimal kernel decreases the latter effect. 


E. TEST SIGNAL FOUR 

Test signal 4 consists of two parallel, FM linear chirps. The Wigner distribution, 
Figure 25(a), shows the expected cross-term midway between the auto-terms. Compared 
to the auto-terms, the cross-term is more oscillatory and of greater magnitude. At the 
cost of smoothed auto-terms, the exponential distribution, Figure 25(b), is able to greatly 
reduce the cross-term by smearing its energy across the region between the chirps. Here, 
a o of five sufficed to give good results. Both the ZAM distribution, Figure 26(a), and 
the optimal distribution, Figure 26(b), removed the cross-terms entirely while retaining 
sharp auto-terms. Again, the optimal distribution demonstrates a slow buildup and decay 


at the beginning and end of the chirps. 


F. TEST SIGNAL FIVE 

Test signal five consists of a single FM quadratic chirp whose instantaneous 
frequency changes rapidly. All of the distributions performed well and tracked the 
instantaneous frequency accurately. The Wigner distribution, Figure 27(a), gives a 
narrow, oscillatory representation that sharpens at higher instantaneous frequencies. 


Similar results are obtained with the exponential distribution, Figure 27(b), although the 
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Figure 23. Wigner and exponential distributions for test signal 3 
(a) Wigner distribution 
(b) Exponential distribution 
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Figure 24. ZAM and optimal distributions for test signal 3 
(a) ZAM distribution of test signal 3 
(b) Optimal distribution of test signal 3 
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Figure 25. Wigner and exponential distributions for test signal 4 
(a) Wigner distribution 
(b) Exponential distribution 
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Figure 26. ZAM and optimal distributions for test signal 4 
(a) ZAM distribution 
(b) Optimal distribution 


JD 


ridge representing the chirp is slightly smoother. The ZAM distribution, Figure 28(a), 
gives a sharp representation but the magnitude of the ridge decreases with increasing 
instantaneous frequency. Apparently, the Gaussian window used with the cone-shaped 
region of support tends to attenuate signal energy when the signal is highly 
nonstationary. Finally, the optimal distribution, Figure 28(b), gives a smoother 
representation than the other distributions. Like the ZAM distribution, the ridge 


decreases in magnitude with instantaneous frequency but to a smaller extent. 


G. TEST SIGNAL SIX 

Test signal 6 is a complex multi-component signal with both nonstationary and 
stationary components. These components include two FM quadratic chirps and a 
complex sinusoid. As expected, the Wigner distribution, Figure 29(a), produces a 
complicated spectrum due to the presence of cross-terms between the three signal 
components. The cross-terms are similar in magnitude to the auto-terms, making 
interpretation difficult. Going to the exponential distribution, Figure 29(b), improves the 
distribution at the cost of some smoothing. Still, the distribution shows that a small 
number of artifacts persist, especially where the quadratic chirps cross. The ZAM 
distribution, Figure 30(a), gives the best results. All of the cross-terms are suppressed 
with little apparent smoothing. As mentioned earlier, the magnitude of the quadratic 
chirps decreases with increased instantaneous frequency. Also, the ZAM distribution 
totally suppresses the crossover of the quadratic chirps. The optimal distribution, Figure 
30(b), gives very disappointing results. While the sinusoid shows up strongly, the 
quadratic chirps are smeared and inconsistent. Evidently, the optimal kernel favored the 
sinusoid over the chirps in the ambiguity domain and thus partially suppresses the chirps. 
Increasing the kernel volume leads to less suppression but greater leakage of cross-terms 
as Figure 30(b) indicates. This indicates that finer sampling of the ambiguity function 1s 


needed for spectrally dynamic signals to avoid competition between components. 
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Figure 27. Wigner and exponential distributions for test signal 5 
(a) Wigner distnbution 
(b) Exponential distnbution 
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Figure 28. ZAM and optimal distributions for test signal 5 
(a) ZAM distribution 
(b) Optimal distribution 
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Figure 29. Wigner and exponential distributions for test signal 6 
(a) Wigner distribution 
(b) Exponential distribution 
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(b) Optimal distribution 


Figure 30. ZAM and optimal distributions for test signal 6 
(a) ZAM distribution 


H. TEST SIGNAL SEVEN 

Test signal 7 consists of two FM linear chirps, which cross on the time-frequency 
plane, embedded in Gaussian white noise. The SNR is 3 dB. Except for the Wigner 
distribution, Figure 31(a), all of the distributions suppress the noise so the spectral 
features of the signal are evident. The Wigner distribution is unrevealing. Using ao of 
five, the exponential distribution, Figure 31(b), removes the noise sufficiently that the 
chirps are clearly visible. Both the ZAM distribution, Figure 32(a), and the optimal 
distribution, Figure 32(b), do a superior job of suppressing noise. Once again, the ZAM 
distribution appears to suppress the signal near the crossing of the chirps. For the 
optimal distribution, the kernel volume controls the rejection of the noise power. Figure 


32(b) used a volume of two; a lower value tends to attenuate the auto-terms. 
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Figure 31. Wigner and exponential distributions for test signal 7 
(a) Wigner distribution 
(b) Exponential distnbution 
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Figure 32. ZAM and optimal distributions for test signal 7 
(a) ZAM distribution 
(b) Optimal distribution 
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VI. RECOMMENDATIONS AND CONCLUSIONS 


In this thesis, the ability of time-frequency distributions to represent the time-varying 
spectral characteristics of nonstationary signals has been examined. Working with 
Cohen's generalized class of distributions (25), the relationship of the kernel's properties 
to those of the resulting distribution was shown. Of these, the most important is the 
ability to suppress cross-terms arising from the bilinear structure of (25). If the spectral 
components of the signal are obscured on the time-frequency plane by cross-terms, the 
remaining properties listed in Table 1 become unimportant. One way to suppress cross- 
terms is via a two-dimensional lowpass kemel in the ambiguity domain. Chapter Four 
showed various kernels and their resulting distributions. However, better performance 
for certain classes of signals can be realized by using a lowpass kernel that adapts to the 
signal's structure in the ambiguity domain as shown in Chapter Five. 

Examining the results shown in Chapter Six for various synthetic analytic signals, the 
Wigner distribution, the most widely known example of Cohen's generalized distribution, 
suffers from cross-terms. For multicomponent or noisy signals, very poor results are 
obtained. The exponential distribution represents a distinct improvement but trades off 
smoothing of auto-terms for removing of cross-terms. Both of these distributions also do 
not offer time or frequency support (see Table 3.) The ZAM distribution performed well 
for all of the signals shown in Chapter Six and in addition had the ability to display the 
spectral features of signals embedded in noise. 

The optimal kernel implemented here shows great promise but needs a few alterations 
for the best results. For the synthetic signals shown in Chapter Six, the optimal kernel 
was able to remove cross-terms while retaining narrow spectral peaks in the time- 
frequency plane. The optimal kernel was superior in resolving the spectral features of a 
noisy, multicomponent signal. However, disappointing results were obtained for the 


FSK and multicomponent signals. As implemented, the optimal kernel makes no attempt 
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to Satisfy the time and frequency support properties or the marginal support properties. 
These properties should be included in the optimization routine as additional constraints 
and their effects examined. The poor results shown for the multicomponent signal can 
probably be alleviated through finer sampling of the ambiguity function. 

In the future, the RID class of kernels should be examined further to understand the 
implications in choosing different windows. The RID potentially offers the best 
performance of any fixed kernel. For the adaptive kernel, work needs to be focused on 
finding a less expensive method for obtaining an optimal kernel. At the present, 
determining the optimal distribution as compared to the ZAM distribution takes an 
amount of time which is two orders of magnitude larger. One possible approach is to 
base the spread vector on the amount of energy present at each sample angle and scale to 
give the desired kernel volume. In this manner, the expensive optimization step can be 


skipped. 
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APPENDIX A. MATLAB SOURCE CODE 


The MATLAB code used in generating the distributions examined in Chapter Six is 
listed below. All data is assumed to be analytic. Real data should first be translated into 


its associated analytic signal by using the techniques discussed in Chapter Four. 


1. Wigner-Ville Distribution 


function PS = wvd(data,winlen,step, begin, theend) 


% PS = wvd(data,winlen,step,begin, theend) 
% 
% ‘wvd.m' returns the Wigner-Ville time-frequency distribution 
for the input data sequence. Window length and time step size 
%$ are determined by the user but the window length should be a 
% power of two. By default the entire data sequence is used but 
% user may specify specific intervals within the data by using 
* 'begin' and ‘end’. 
% 
$ data: input data sequence 
% winlen: window length 
$ step: time step size 
* begin: desired starting point within data 
% theend: desired ending point within data 
[m,n] = size(data); 
LEM Sen 
Gata = data.'; 
end 


datalen = length(data); 


% use user specified starting and ending points if present 
start = 1; 
finish = datalen; 
Lf nNaygine==s5 
if begin > 1 
start = begin; 
end 
if theend < datalen 
finish = theend; 
end 
end 


% initialize data spaces 

Gata = [zeros(l,winlen/2) data zeros(1,winlen/2) ]; 
plea zeros(1,winlen/2 + 1); 

ConG zeros(1,winlen) ; 

PS = zeros(1l,winlen) ; 


% calculate distribution 
index = 1; 
for n = (winlen/2)+start:step: (winlen/2)+finish 


% calulate instantaneous correlation function 
prod = data(n-winlen/2:n) .*conj (fliplr(data(n:n+twinlen/2))); 
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Seabee = (Dred con) (fLliplr(prod(2:winlen/2)))]; 
@orr(l) = 0; 

Powinaesx,:) Sift (eorr) > 

index = index + 1; 


end 


2. Exponential Distribution 


function PS = cw(data,tauwin,muwin, step, sigma) 
PS = cw(data,tauwin,muwin,step,sigma) 
Implements exponential kernel via RWED algorithm 


data: data sequence 

tauwin: outer window length 

muwin: inner window length 

step: time step size 

sigma: parameter for exponential kernel; a smaller 
value reduces the cross-terms more 


oP OP OP OP OP OP OP OP OP OP OP 


[m,n}) = size(data); 
if m > n 

Gata = data.'; 
end 


datalen = length(data); 


initialize data spaces 

Winlen = tauwin/2 + muwin/2; 

Gata = [zeros(l,winlen) data zeros(1,winlen) J; 
corr = zeros(1,tauwin) ; 

Eo = COrr;: 

mu = -muwin/2 + 1l:muwin/2 - 1; 


% Apply Choi-Williams RWED 

tine = 1; 

for n = l+winlen:step:datalen+winlen 
Piagex = 2; 
for tau = -tauwin/2 + l:tauwin/2 -1 


% calculate smoothed autocorrelation function 
if tau ~= 0 
scale = 1/(sqrt(4*pi*tau%2/sigma) ); 
gQwin = exp( -sigma*mu.%2/ (4*tau%2) ); 
filt = gwin.*data(n+tau+mu) .*conj (data(n-tau+mu) ); 


corr (index) = scale*sum(filt); 
else 

corr (index) = data(n).*conj(data(n)); 
end 


index = index + 1; 


end 
PS{line,:) = £ft (corr); 
line = line + 1; 

end 
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3. ZAM Distribution 


function PS = zam(data,winlen,step) 


% 
% PS = zam(data,winlen,step) 
% 
$ 'zam.m' returns the ZAM time-frequency distribution for 
$ the input data sequence. Window length and time step size 
% are determined by the user but each should be a power of 
% two. A Gaussian window is used to filter the 
% autocorrelation estimate where the variance is chosen 
such that the window has a value of 0.0001 at its endpoints. 
% 
% data: data sequence 
$ winlen: window length 
[m,n] = size(data); 
afm > 
data = data.'; 
end 


datalen = length(data) ; 


tinitialize data spaces 

maxlen = winlen - 1; 

data = [zeros(1,2*maxlen) data zeros(1,2*maxlen) }; 
corr = zeros(1,winlen); 

PS =SCOrEr;, 


% determine Gaussian window 

k = [O:maxlen]; 

alpha = -log(0.0001) /(2*maxlen“’2) ; 
gQwin = exp(-2*alpha*k.%2); 

Gwamil) = O25%qwinil)- 


% Apply ZAM method 
line = 1; 
for n = 1 + 2*maxlen:step:datalen + 2*maxlen 


% find local autocorrelation 
for tau = O:maxlen 
mu = [-tau:tau]; 
corr (tau+1) = sum(data(n-mu+tau) .*conj (data (n-mu- 
tau) )) *gwin(tau+l1) ; 
end 
PS (line,:) = 4*real(£tt (corr): 
line = line + 1; 


end 


4. Optimal distribution 


Finding the optimal distribution within MATLAB requires three separate programs: 


POLTERP.M - performs polar interpolation of a rectangularly sampled ambiguity 
function 
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OPTKERN.M - given the polar-sampled ambiguity function and desired kernel 
volume, determines the optimum spread vector 


KERNEN.M - produces the optimal kernel given the optimal spread vector 


Once the optimal kernel has been found, the signal's ambiguity function is multiplied by 
the kernel. Taking the two-dimensional Fourier transform of this product gives the 


optimal distribution. 


munction pol_af = polterp(af) 


fepoleat = polterp(af) 

% 

% 'polterp.m' performs a polar interpolation of a 

% rectangularly sampled ambiguity function. By default 
% a 64x64 input matrix is assumed. 

% 

% pol_af: polar sampled ambiguity function 

¢ af: rectangularly sampled ambiguity function 

ere 5S; 

ye 33 > 


Bouman = Zeros (31,31); 
Gelesi = p1/3il; 
meme at .*con) (af); 


% copy points along tau-axis 
Poteet: ) = af (cy,cx+1:64); 


% interpolate, work along concentric circles of 
% increasing radius 
pore rad = 1:31 


per = delpsi; 
POEwang = 2:31 


% calculate rectangular coordinates of current point 
hae ead Cos (psi) ; 
Ve= Faad*sin(psi); 


% convert to array coordinates 


ewe +: 3c; 
Y= cy - ¥y; 

%* find four nearest neighbors 
So le=: Lolo oie). 
yl = Elieer yy); 
X2e=) xX) 4 1; 
y2 = yl; 
x3 = X2; 

Vio y= eles 
x4 = xl; 
y4 = y3; 
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% find the AF values of the neighbors 


afl = af(yl <i 
afZ= ai (yz =2) : 
af3 = ara.) 
af4 = af(y4,x4); 


% perform 2D interpolation via two 1D interpolations 
%* determine if point lies in upper or lower triangle 
%* of neighborhood 

delx = xX - Xl; 

ely =nyee v1; 


if delx > dely % upper triangle 


Xp = X2; 
if x ~= xl 
VPuaey2 + (y - yijv tx 
else 
yp = y2; 
end 
afp = af2 + (af3 - af2)* (yp - y2); 


else % lower triangle 
yp = y3;i 
VDE Vv ~= yi 
Xp = X4545(x -" x7 yy - eye 
else 
xp = x4; 
end 


afp = af4 + (af3 - af4)*(xp - x4); 
end 
% perform final interpolation 
dl = sqrte((x - ¥1)72°> 4a] es 
d2 = Sqrei(x - Xp)°2 tay yb 2 
pol_af(ang,rad) = afl + ((afp - afl1)/(dl + d2))*dl; 
psi = psi + delpsi; 


end 


end 


function spv = optkern(pol_af,vol,mu) 
spv = optkern(pol_af,vol,mu) 


‘'spv.m' calculates the optimum spread vector using the 
steepest ascent algorithm. 


Sspv: optimal spread vector 

pol_af: polar sampled ambiguity function 
vol: desired kernel volume (typically 1-5) 
mu: step size (typically 25) 


0P oP dP OP OP OP AP OP OP 


% set up constants 


Ernue s=—/; 
false = 0; 
[maxa,maxr] = size(pol_af); 


delpsi = pi/maxa; 
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converged = false; 

mMaxiter = 1; 

mol = 0.001; 

tpol_af = pol_af/max(max(pol_af)); 


% initialized spread vector 
spv = sqrt (2*pi*vol/ (maxa*delpsi) ) *ones (maxa,1); 


% enter steepest ascent loop 
while (converged == false) & (maxiter < 30) 


% calculate gradient for each angle 
grad = zeros (maxa,1); 
for ang = 1:maxa 


% sum over each radial point 
for rad = l:maxr 
term = rad*3*pol_af (ang, rad) *exp (-rad*2/spv(ang) *2); 
grad(ang) = grad(ang) + term; 
end 
grad(ang) = delpsi*grad(ang) /spv(ang) *3; 


end 


% update spread vector 
Spv = spv + mu*grad; 


% scale spread vector to maintain constant volume 
Soe op y  SdLe (2 *pi*vol/ (delpsi*sum(spv.%*2))); 


% check for convergence 
if sum(grad) < tol 
converged = true; 
end 
maxiter = maxiter + 1; 


end 


% prepare spread vector for kernel generation 
aoe (spy spv(1));: 


function kernel = kerngen(spreadqd) 
kernel = kerngen(spreadqd) 


‘'kerngen.m' generates the optimal kernel given the optimal 
spread vector. By default, the kernel is 64x64. 


kernel: optimal kernel 
spread: optimal spread vector 


oP oP dP dP OP OP Of 


delq = pi/(length(spread) - 1); 
max_m 31; 

max_n 16; 

kernel = zeros(16,31); 


% generate the upper- half of the kernel 
oO = Ibe 
fOrs. = 0-51 


Meo 
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for j = -32:31 


% convert rect. coordinates to polar 
rsq = ((i7Sie2 + (1732) 
q = atan2(i,j)/delq + 1; 


% smoothly interpolate the spread vector for the current angle 
var = spline(1l:length(spread) ,spread,q); 

kernel(n,m) = exp(-rsq*2/(2*var“%2) ); 

m=m +1; 


end 
Nn = hor 


end 
% generate the rest of the kernel using symmetry 
kernel = [flipud(kernel)' fliplr(kernel(2:32,:))']'; 


kernel = [zeros(64,1) kernel']'; 
kernel(:,1) = zeros(64,1); 
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APPENDIX B. DERIVATIONS OF VOLUME LIMITS 
ON THE OPTIMAL KERNEL 


As stated previously, the constraint on kernel volume controls the trade-off between 
cross-term suppression and the smoothing of auto-terms. A low kernel volume increases 
the probability that cross-terms will be suppressed and that the auto-terms will be 
smoothed. Increasing the volume gives sharper auto-terms but raises the probability that 
some portion of the cross-terms pass. Baraniuk and Jones have recommended as a 
general guideline that the kernel volume be in the range [20}[21] 

= O)= 5. (106) 
A derivation of these limits follows. 
A lower limit is placed on the kernel volume by restricting the smoothing of the auto- 


terms. A reasonable limit is to allow no more smoothing than the spectrogram. The 


volume of the spectrogram kernel, o,(6,7), is given by 


= 
= J Jlos(@.2)/° dedt =|0,(0,0)f° =1 (107) 


assuming the average energy of the window is unity. Thus, the lower bound on the 
kernel volume is 
a2 1. (108) 
The upper limit is motivated by the desire to limit smoothing without passing 
significant cross-term energy. To develop a reasonable upper limit, Baraniuk and Jones 
consider the Gaussian signal 


1° 


x(t) =e? (109) 
T 4 
For this signal, the ambiguity function 1s 
oe 
AiO,t)=e + (110) 


and the optimal kernel is given by 


We 


Q7 417 


,, (Ot) =ensee (111) 





Taking the two-dimensional Fourier transform of the generalized ambiguity function 
gives the optimal time-frequency distribution, 


O =(7?4/? (i++) 


Trae (112) 


P(t, f)= 


To quantify the amount of smoothing as a function of the kernel volume, the radius r, of 


the circular contour where the auto-term has decayed to e™’ is calculated, 


pa yitd (113) 
O 


For large a, r, approaches unity indicating no smoothing but with complete passage of 
all cross-terms. Decreasing & increases r,, indicating greater smoothing and a smaller 
chance of passing cross-terms. This relationship is shown in Figure 33. For values of & 
significantly greater than one, the amount of smoothing does not change significantly but 
the chance of passing cross-terms increases rapidly. A reasonable upper bound occurs at 
the knee point of (113), or 

Ss (114) 


Contour Radius 
rN 


1.5 No Smoothing 


O 2 4 S 8 10 


Kernel Volume 


Figure 33. Plot of Equation (112) 
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